Query Planning in the Presence of Overlapping Sources

Authors

  • Jens Bleiholder
  • Samir Khuller
  • Felix Naumann
  • Louiqa Raschid
  • Yao Wu
Abstract

Navigational queries on Web-accessible life science sources pose unique query optimization challenges. The objects in these sources are interconnected with objects in other sources, forming a large and complex graph, and the sources overlap in the objects they contain. Answering a query requires traversing multiple alternate paths through these sources. Each path can be associated with a benefit, namely the cardinality of the target object set (TOS) of objects reached in the result, and with an evaluation cost of reaching the TOS. We present dual problems in selecting the best set of paths. The first problem is to select a set of paths that satisfies a constraint on the evaluation cost while maximizing the benefit (the number of distinct objects in the TOS). The dual problem is to select a set of paths that satisfies a threshold on the TOS benefit with minimal evaluation cost. The two problems can be mapped to the budgeted maximum coverage problem and the maximal set cover with a threshold. To solve these problems, we explore several solutions, including greedy heuristics, a randomized search, and a traditional IP/LP formulation with bounds. We perform experiments on a real-world graph of life sciences objects from NCBI and report on the computational overhead of our solutions and their performance compared to the optimal solution.
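
The two selection problems described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it is a plain benefit-per-cost greedy heuristic, one common way to attack budgeted maximum coverage, and no approximation guarantee is claimed for it here. The path names, costs, and object identifiers in `EXAMPLE_PATHS` are hypothetical, and the sketch assumes every path has a strictly positive cost.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

# A path is (evaluation cost, set of objects it reaches in the TOS).
# Assumption: costs are strictly positive.
Path = Tuple[float, FrozenSet[str]]


def _best_path(paths: Dict[str, Path], covered: Set[str],
               max_cost: float) -> Optional[str]:
    """Return the affordable path with the highest new-objects-per-cost ratio."""
    best_name, best_ratio = None, 0.0
    for name, (cost, objects) in paths.items():
        gain = len(objects - covered)  # overlap with earlier paths adds nothing
        if gain > 0 and cost <= max_cost and gain / cost > best_ratio:
            best_name, best_ratio = name, gain / cost
    return best_name


def greedy_budgeted_coverage(paths: Dict[str, Path],
                             budget: float) -> Tuple[List[str], Set[str]]:
    """Problem 1: maximize distinct TOS objects subject to a cost budget."""
    candidates = dict(paths)
    chosen: List[str] = []
    covered: Set[str] = set()
    remaining = budget
    while True:
        name = _best_path(candidates, covered, remaining)
        if name is None:  # nothing affordable adds new objects
            return chosen, covered
        cost, objects = candidates.pop(name)
        chosen.append(name)
        covered |= objects
        remaining -= cost


def greedy_threshold_cover(paths: Dict[str, Path],
                           threshold: int) -> Optional[Tuple[List[str], float]]:
    """Problem 2 (dual): reach at least `threshold` objects at low total cost.

    Returns None when the threshold cannot be reached at all.
    """
    candidates = dict(paths)
    chosen: List[str] = []
    covered: Set[str] = set()
    total_cost = 0.0
    while len(covered) < threshold:
        name = _best_path(candidates, covered, float("inf"))
        if name is None:
            return None
        cost, objects = candidates.pop(name)
        chosen.append(name)
        covered |= objects
        total_cost += cost
    return chosen, total_cost


# Hypothetical example: four overlapping paths over six objects.
EXAMPLE_PATHS: Dict[str, Path] = {
    "p1": (4.0, frozenset({"a", "b", "c", "d"})),
    "p2": (1.0, frozenset({"a", "b"})),
    "p3": (2.0, frozenset({"c", "e"})),
    "p4": (3.0, frozenset({"d", "e", "f"})),
}

if __name__ == "__main__":
    print(greedy_budgeted_coverage(EXAMPLE_PATHS, budget=3.0))
    print(greedy_threshold_cover(EXAMPLE_PATHS, threshold=5))
```

Because only objects not yet covered count toward a path's gain, a path that mostly duplicates earlier results becomes less attractive as the selection grows, which is the effect of source overlap that the abstract highlights.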

Similar resources

Resilience-Based Framework for Distributed Generation Planning in Distribution Networks

Events with low probability and high impact, which cause heavy damage every year, seriously threaten the health of distribution networks. Hence, enhancing network resilience and continuity of power supply demands more attention than ever, all over the world. In modern distribution networks, because of the increasing presence of distributed generation resources, an alternat...

Query Planning with Disjunctive Sources

We examine the query planning problem in information integration systems in the presence of sources that contain disjunctive information. We show that datalog, the language of choice for representing query plans in information integration systems, is not sufficiently expressive in this case. We prove that disjunctive datalog with inequality is sufficiently expressive, and present a construction...

Varicella Zoster Virus (VZV) Origin-Dependent Plasmid Replication in the Presence of the Four Overlapping Cosmids Comprising the Complete Genome of VZV

The Varicella-Zoster Virus (VZV) genome contains both cis-acting and trans-acting elements, which are important in viral DNA replication. The cis-acting elements consist of two copies of oriS, and the trans-acting elements are those genes whose products are required for virus DNA replication. It has been shown that each of the seven genes required for ori-dependent DNA synthesis of Herpes Simpl...

PDQ: Proof-driven Query Answering over Web-based Data

The data needed to answer queries is often available through Web-based APIs. Indeed, for a given query there may be many Web-based sources which can be used to answer it, with the sources overlapping in their vocabularies, and differing in their access restrictions (required arguments) and cost. We introduce PDQ (Proof-Driven Query Answering), a system for determining a query plan in the presence ...

Optimizing Query Planning with Limited Source Capabilities in the Presence of Inclusion and Functional Dependencies

Information integration is the problem of providing uniform access to multiple heterogeneous data sources. The most common approach to this problem, called Global-as-View, consists of providing a global schema of data in which each relation of the schema is defined as a view over a set of data sources. Recent works deal with this problem in the case of limited source capabilities, where...

Publication date: 2006